Abstract. - Personal income distribution in the USA has a well-defined two-class structure.
The majority of population (97-99%) belongs to the lower class characterized by the exponential 
Boltzmann-Gibbs ("thermal") distribution, whereas the upper class (1-3% of population) has a 
Pareto power-law ("superthermal") distribution. By analyzing income data for 1983-2001, we 
show that the "thermal" part is stationary in time, save for a gradual increase of the effective 
temperature, whereas the "superthermal" tail swells and shrinks following the stock market. 
We discuss the concept of equilibrium inequality in a society, based on the principle of maximal
entropy, and quantitatively show that it applies to the majority of population.

Attempts to apply the methods of exact sciences, such as physics, to describe a society
have a long history [1]. At the end of the 19th century, Italian physicist, engineer, economist,
and sociologist Vilfredo Pareto suggested that income distribution in a society is described
by a power law [2]. Modern data indeed confirm that the upper tail of income distribution
follows the Pareto law [3-7]. However, the majority of population does not belong there, so
characterization and understanding of their income distribution remains an open problem.
Dragulescu and Yakovenko [8] proposed that the equilibrium distribution should follow an
exponential law analogous to the Boltzmann-Gibbs distribution of energy in statistical physics.
The first factual evidence for the exponential distribution of income was found in Ref. [9].
Coexistence of the exponential and power-law parts of the distribution was recognized in
Ref. [10]. However, these papers, as well as Ref. [11], studied the data only for a particular year.

Here we analyze temporal evolution of the personal income distribution in the USA during
1983-2001. We show that the US society has a well-defined two-class structure. The majority
of population (97-99%) belongs to the lower class and has an exponential ("thermal")
distribution of income that is very stable in time. The upper class (1-3% of population) has a
power-law ("superthermal") distribution, whose parameters significantly change in time with the
rise and fall of stock market. Using the principle of maximal entropy, we discuss the concept
of equilibrium inequality in a society and quantitatively show that it applies to the bulk of
population. Most of academic and government literature on income distribution and inequality
[12-15] does not attempt to fit the data by a simple formula. When fits are performed, usually
the log-normal distribution [16] is used for the lower part of the distribution [5-7]. Only
recently the exponential distribution started to be recognized in income studies [17, 18], and
models showing formation of two classes started to appear [19,20].

© EDP Sciences

EUROPHYSICS LETTERS

Fig. 1 - Cumulative probability C(r) and probability density P(r) plotted in the log-linear scale vs.
r/T, the annual personal income r normalized by the average income T in the exponential part of
the distribution. The IRS data points are for 1983-2001, and the columns of numbers give the values
of T for the corresponding years. (Axes: rescaled adjusted gross income r/T; upper axis: adjusted
gross income in 2001 dollars, k$.)

Fig. 2 - Log-log plots of the cumulative probability C(r) vs. r/T for a wider range of income r.

Let us introduce the probability density P(r), which gives the probability P(r) dr to have
income in the interval (r, r + dr). The cumulative probability $C(r) = \int_r^\infty dr'\, P(r')$ is the
probability to have income above r, C(0) = 1. By analogy with the Boltzmann-Gibbs distribution
in statistical physics [8,9], we consider an exponential function $P(r) \propto \exp(-r/T)$,
where T is a parameter analogous to temperature. It is equal to the average income
$T = \langle r \rangle = \int_0^\infty dr'\, r' P(r')$, and we call it the "income temperature." When P(r) is
exponential, $C(r) \propto \exp(-r/T)$ is also exponential. Similarly, for the Pareto power law
$P(r) \propto 1/r^{\alpha+1}$, $C(r) \propto 1/r^{\alpha}$ is also a power law.
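
The relations between P(r) and C(r) stated above can be checked numerically. The following is a minimal sketch, not part of the original analysis; the values of `T`, `alpha`, and `r_min` are hypothetical and chosen only for illustration.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule, written out so the sketch does not depend on a NumPy version."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

T = 40.0       # hypothetical income temperature, k$
alpha = 1.5    # hypothetical Pareto index
r_min = 1.0    # hypothetical lower cutoff of the Pareto density

# Exponential density P(r) = exp(-r/T)/T on a grid wide enough to neglect the tail.
r = np.linspace(0.0, 60.0 * T, 600_001)
P_exp = np.exp(-r / T) / T
norm_exp = trapezoid(P_exp, r)                 # C(0), expected to be close to 1
mean_exp = trapezoid(r * P_exp, r)             # <r>, expected to be close to T
above = r >= 2.0 * T
C_exp_2T = trapezoid(P_exp[above], r[above])   # C(2T), to compare with exp(-2)

# Pareto density P(r) = alpha * r_min**alpha / r**(alpha + 1) for r >= r_min,
# integrated on a logarithmic grid because the tail is heavy.
r0 = 10.0 * r_min
s = np.geomspace(r0, 1e8 * r_min, 400_001)
P_par = alpha * r_min**alpha / s**(alpha + 1)
C_par_r0 = trapezoid(P_par, s)                 # to compare with (r_min / r0)**alpha

print(norm_exp, mean_exp / T, C_exp_2T, np.exp(-2.0))
print(C_par_r0, (r_min / r0) ** alpha)
```

In both cases the numerically integrated cumulative probability should agree with the closed form, exponential for the exponential density and a power law with index $\alpha$ for the Pareto density.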

We analyze the data [21] on personal income distribution compiled by the Internal Revenue 
Service (IRS) from the tax returns in the USA for the period 1983-2001 (presently the latest 
available year). The publicly available data are already preprocessed by the IRS into bins 
and effectively give the cumulative distribution function C(r) for certain values of r. First we
make the plots of log C(r) vs. r (the log-linear plots) for each year. We find that the plots are 
straight lines for the lower 97-98% of population, thus confirming the exponential law. From 
the slopes of these straight lines, we determine the income temperatures T for each year. In 
Fig. 1 we plot C(r) and P(r) vs. r/T (income normalized to temperature) in the log-linear
scale. In these coordinates, the data sets for different years collapse onto a single straight
line. (In Fig. 1 the data lines for the 1980s and 1990s are shown separately and offset vertically.)
The columns of numbers in Fig. 1 list the values of the annual income temperature T for the
corresponding years, which changes from 19 k$ in 1983 to 40 k$ in 2001. The upper horizontal
axis in Fig. 1 shows income r in k$ for 2001.
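
The determination of T from the slope of the log-linear plot can be sketched as follows. This is an illustration on synthetic data, not the IRS data set: the bin positions, the 2% scatter, and the value `T_true` are assumptions made only so that the example runs.

```python
import numpy as np

rng = np.random.default_rng(0)

T_true = 40.0                       # hypothetical temperature used to generate the data, k$
r = np.linspace(5.0, 120.0, 12)     # hypothetical income values, k$ (all within the exponential part)
# Synthetic cumulative distribution C(r) = exp(-r/T) with 2% multiplicative scatter.
C = np.exp(-r / T_true) * np.exp(rng.normal(0.0, 0.02, r.size))

# Straight-line fit of log C(r) vs. r; the slope equals -1/T.
slope, intercept = np.polyfit(r, np.log(C), 1)
T_fit = -1.0 / slope

print(f"fitted T = {T_fit:.1f} k$")
```

Only points from the lower, exponential part of the distribution should enter such a fit, because the power-law tail bends the log-linear plot away from a straight line.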

In Fig. 2 we show the same data in the log-log scale for a wider range of income r, up to
about 300T. Again we observe that the sets of points for different years collapse onto a single
exponential curve for the lower part of the distribution, when plotted vs. r/T. However, above
a certain income $r_* \approx 4T$, the distribution function changes to a power law, as illustrated by
the straight lines in the log-log scale of Fig. 2. Thus we observe that income distribution in the
USA has a well-defined two-class structure. The lower class (the great majority of population) 
is characterized by the exponential, Boltzmann-Gibbs distribution, whereas the upper class 
(the top few percent of population) has the power-law, Pareto distribution. The intersection 
point of the exponential and power-law curves determines the income r* separating the two 
classes. The collapse of data points for different years in the lower, exponential part of the 
distribution in Figs. 1 and 2 shows that this part is very stable in time and, essentially, has
not changed at all over the last 20 years, save for a gradual increase of temperature T in nominal
dollars. We conclude that the majority of population is in statistical equilibrium, analogous 
to the thermal equilibrium in physics. On the other hand, the points in the upper, power-law 
part of the distribution in Fig. 2 do not collapse onto a single line. This part significantly
changes from year to year, so it is out of statistical equilibrium. A similar two-part structure 
in the energy distribution is often observed in physics, where the lower part of the distribution 
is called "thermal" and the upper part "superthermal" [22]. 

Temporal evolution of the parameters T and r* is shown in Fig. 3. We observe that the
average income T (in nominal dollars) was increasing gradually, almost linearly in time, and
doubled in the last twenty years. In Fig. 3 we also show the inflation coefficient (the consumer
price index CPI from Ref. [23]) compounded on the average income of 1983. For the twenty
years, the inflation factor is about 1.7, thus most, if not all, of the nominal increase in T is
inflation. Also shown in Fig. 3 is the nominal gross domestic product (GDP) per capita [23],
which increases in time similarly to T and CPI. The ratio r*/T varies between 4.8 and 3.2 in
Fig. 3.
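
The comparison between the nominal growth of T and inflation is simple arithmetic, sketched below with the rounded numbers quoted in the text (19 k$, 40 k$, and a factor of about 1.7). Because the inputs are rounded, the outputs are rough estimates only; the logarithmic "share" is one possible way to split the growth and is an assumption of this sketch.

```python
import math

T_1983, T_2001 = 19.0, 40.0   # k$, values quoted in the text
cpi_factor = 1.7              # approximate inflation factor quoted in the text

nominal_factor = T_2001 / T_1983
real_factor = nominal_factor / cpi_factor
# Share of the nominal growth attributable to inflation, measured on a logarithmic scale.
inflation_share = math.log(cpi_factor) / math.log(nominal_factor)

print(f"nominal growth factor: {nominal_factor:.2f}")
print(f"growth factor after removing inflation: {real_factor:.2f}")
print(f"share of nominal log-growth due to inflation: {inflation_share:.0%}")
```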

In Fig. 4 we show how the parameters of the Pareto tail $C(r) \propto 1/r^{\alpha}$ change in time.
Curve (a) shows that the power-law index $\alpha$ varies between 1.8 and 1.4, so the power law is
not universal. Because a power law decays with r more slowly than an exponential function,
the upper tail contains more income than we would expect for a thermal distribution, hence
we call the tail "superthermal" [22]. The total excessive income in the upper tail can be
determined in two ways: as the integral $\int_{r_*}^\infty dr'\, r' P(r')$ of the power-law distribution, or as
the difference between the total income in the system and the income in the exponential part.
Curves (c) and (b) in Fig. 4 show the excessive income in the upper tail, as a fraction f of the
total income in the system, determined by these two methods, which agree with each other
reasonably well. We observe that f increased by a factor of 5 between 1983 and 2000, from
4% to 20%, but decreased in 2001 after the crash of the US stock market. For comparison,
curve (e) in Fig. 4 shows the stock market index S&P 500 divided by inflation. It also increased
by a factor of 5.5 between 1983 and 1999, and then dropped after the stock market crash.
We conclude that the swelling and shrinking of the upper income tail is correlated with the
rise and fall of the stock market. Similar results were found for the upper income tail in Japan
in Ref. [4]. Curve (d) in Fig. 4 shows the fraction of population in the upper tail. It increased
from 1% in 1983 to 3% in 1999, but then decreased after the stock market crash. Notice,
however, that the stock market dynamics had a much weaker effect on the average income T
of the lower, "thermal" part of income distribution shown in Fig. 3.
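
The income integral over a Pareto tail has a simple closed form, which the sketch below compares with direct numerical integration. The normalization of the tail density and the numerical values are assumptions made for illustration; the sketch does not reproduce the fitting procedure behind Fig. 4.

```python
import numpy as np

def tail_income(alpha, r_star, tail_population):
    """Income per capita carried by a Pareto tail above r_star (requires alpha > 1).

    Assumes the density alpha * tail_population * r_star**alpha / r**(alpha + 1)
    for r >= r_star, whose integral over the tail equals tail_population.
    """
    if alpha <= 1.0:
        raise ValueError("the income integral diverges for alpha <= 1")
    return tail_population * alpha * r_star / (alpha - 1.0)

# Hypothetical illustrative values, with income measured in units of T.
alpha, r_star, tail_population = 1.5, 4.0, 0.02

closed_form = tail_income(alpha, r_star, tail_population)

# Numerical check on a logarithmic grid (the integrand decays only as r**(-alpha)).
r = np.geomspace(r_star, 1e12 * r_star, 600_001)
density = alpha * tail_population * r_star**alpha / r**(alpha + 1)
integrand = r * density
numeric = float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(r)) / 2.0)

print(closed_form, numeric)
```

The factor $\alpha/(\alpha - 1)$ grows quickly as $\alpha$ approaches 1, so a modest decrease of the Pareto index within the range 1.8 to 1.4 noticeably increases the income carried by the tail.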

Fig. 3 - Temporal evolution of various parameters characterizing income distribution.

Fig. 4 - (a) The Pareto index $\alpha$ of the power-law tail $C(r) \propto 1/r^{\alpha}$. (b) The excessive income in
the Pareto tail, as a fraction f of the total income in the system, obtained as the difference between
the total income and the income in the exponential part of the distribution. (c) The tail income
fraction f, obtained by integrating the Pareto power law of the tail. (d) The fraction of population
belonging to the Pareto tail. (e) The stock-market index S&P 500 divided by the inflation coefficient
and normalized to 1 in 1983.

For discussion of income inequality, the standard practice is to construct the so-called
Lorenz curve [12]. It is defined parametrically in terms of the two coordinates x(r) and
y(r) depending on the parameter r, which changes from 0 to $\infty$. The horizontal coordinate
$x(r) = \int_0^r dr'\, P(r')$ is the fraction of population with income below r. The vertical coordinate
$y(r) = \int_0^r dr'\, r' P(r') / \int_0^\infty dr'\, r' P(r')$ is the total income of this population, as a fraction of
the total income in the system. Fig. 5 shows the data points for the Lorenz curves in 1983
and 2000, as computed by the IRS [15]. For a purely exponential distribution of income
$P(r) \propto \exp(-r/T)$, the formula $y = x + (1 - x) \ln(1 - x)$ for the Lorenz curve was derived in
Ref. [9]. This formula describes income distribution reasonably well in the first approximation
[9], but visible deviations exist. These deviations can be corrected by taking into account that
the total income in the system is higher than the income in the exponential part, because of
the extra income in the Pareto tail. Correcting for this difference in the normalization of y,
we find a modified expression [11] for the Lorenz curve



$$y = (1 - f)\,[x + (1 - x) \ln(1 - x)] + f\,\Theta(x - 1), \qquad (1)$$



where f is the fraction of the total income contained in the Pareto tail, and $\Theta(x - 1)$ is the
step function equal to 0 for $x < 1$ and 1 for $x \geq 1$. The Lorenz curve (1) experiences a vertical
jump of the height f at x = 1, which reflects the fact that, although the fraction of population
in the Pareto tail is very small, their fraction f of the total income is significant. It does not
matter for Eq. (1) whether the extra income in the upper tail is described by a power law or
another slowly decreasing function P(r). The Lorenz curves, calculated using Eq. (1) with
the values of f from Fig. 4, fit the IRS data points very well in Fig. 5.
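
Eq. (1) is straightforward to evaluate. The sketch below is an illustrative implementation; the function names are hypothetical, and the values f = 0.04 and f = 0.20 are the approximate tail fractions quoted in the text for 1983 and 2000.

```python
import numpy as np

def lorenz_exponential(x):
    """Lorenz curve y = x + (1 - x) ln(1 - x) for a purely exponential distribution."""
    x = np.asarray(x, dtype=float)
    u = 1.0 - x
    # (1 - x) ln(1 - x) -> 0 as x -> 1, so guard the logarithm at x = 1.
    return x + u * np.log(np.where(u > 0.0, u, 1.0))

def lorenz_with_tail(x, f):
    """Eq. (1): exponential part rescaled by (1 - f) plus a jump of height f at x = 1."""
    x = np.asarray(x, dtype=float)
    step = np.where(x >= 1.0, 1.0, 0.0)
    return (1.0 - f) * lorenz_exponential(x) + f * step

x = np.array([0.0, 0.25, 0.5, 0.75, 0.99, 1.0])
for f in (0.04, 0.20):
    print(f, np.round(lorenz_with_tail(x, f), 4))
```

Just below x = 1 the curve approaches 1 - f, and the remaining fraction f of the income is added in the jump at x = 1.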

Fig. 5 - Main panel: Lorenz plots for income distribution in 1983 and 2000. The data points are from
the IRS [15], and the theoretical curves represent Eq. (1) with f from Fig. 4. Inset: The closed circles
are the IRS data [15] for the Gini coefficient G, and the open circles show the theoretical formula
G = (1 + f)/2.

The deviation of the Lorenz curve from the diagonal in Fig. 5 is a certain measure of
income inequality. Indeed, if everybody had the same income, the Lorenz curve would be the
diagonal, because the fraction of income would be proportional to the fraction of population.
The standard measure of income inequality is the so-called Gini coefficient $0 \leq G \leq 1$, which
is defined as the area between the Lorenz curve and the diagonal, divided by the area of the
triangle beneath the diagonal [12]. It was calculated in Ref. [9] that G = 1/2 for a purely
exponential distribution. Temporal evolution of the Gini coefficient, as determined by the
IRS [15], is shown in the inset of Fig. 5. In the first approximation, G is quite close to the
theoretically calculated value 1/2. The agreement can be improved by taking into account
the Pareto tail, which gives G = (1 + f)/2 for Eq. (1). The inset in Fig. 5 shows that this
formula fits the IRS data for the 1990s very well with the values of f taken from Fig. 4. We
observe that income inequality was increasing for the last 20 years, because of swelling of the
Pareto tail, but started to decrease in 2001 after the stock market crash. The deviation of G
below 1/2 in the 1980s cannot be captured by our formula. The data points for the Lorenz
curve in 1983 lie slightly above the theoretical curve in Fig. 5, which accounts for G < 1/2.
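
The relation G = (1 + f)/2 follows from the area under the Lorenz curve of Eq. (1), and it can be checked numerically. The sketch below is illustrative; the function name and the grid size are arbitrary choices.

```python
import numpy as np

def gini_from_eq1(f, n=200_001):
    """Gini coefficient G = 1 - 2 * (area under the Lorenz curve) for Eq. (1)."""
    x = np.linspace(0.0, 1.0, n)
    u = 1.0 - x
    # (1 - x) ln(1 - x) -> 0 as x -> 1, so replace the logarithm's argument by 1 there.
    y = (1.0 - f) * (x + u * np.log(np.where(u > 0.0, u, 1.0)))
    # The jump of height f at x = 1 has zero width and does not change the area.
    area = float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)
    return 1.0 - 2.0 * area

for f in (0.0, 0.04, 0.20):
    print(f, gini_from_eq1(f), (1.0 + f) / 2.0)
```

The area under $x + (1 - x)\ln(1 - x)$ on the unit interval is 1/4, so the area under Eq. (1) is (1 - f)/4 and G = 1 - (1 - f)/2 = (1 + f)/2.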

Thus far we discussed the distribution of individual income. An interesting related question
is the distribution of family income $P_2(r)$. If both spouses are earners, and their incomes are
distributed exponentially as $P_1(r) \propto \exp(-r/T)$, then

$$P_2(r) = \int_0^r dr'\, P_1(r') P_1(r - r') \propto r \exp(-r/T). \qquad (2)$$

Eq. (2) is in a good agreement with the family income distribution data from the US Census
Bureau [9]. In Eq. (2), we assumed that incomes of spouses are uncorrelated. This assumption
was verified by comparison with the data in Ref. [11]. The Gini coefficient for family income
distribution (2) was found to be G = 3/8 = 37.5% [9], in agreement with the data. Moreover,
the calculated value 37.5% is close to the average G for the developed capitalist countries of
North America and Western Europe, as determined by the World Bank [11].
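
The values G = 1/2 for individual income and G = 3/8 for family income can be illustrated by a small Monte Carlo simulation of two uncorrelated exponential earners. This is a sketch on simulated numbers, not on Census data; the value of `T`, the sample size, and the seed are arbitrary.

```python
import numpy as np

def gini(sample):
    """Gini coefficient of a sample, from the sorted-values formula."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n

rng = np.random.default_rng(1)
T = 40.0          # hypothetical income temperature, k$
n = 200_000       # number of simulated two-earner families

earner_1 = rng.exponential(scale=T, size=n)
earner_2 = rng.exponential(scale=T, size=n)   # drawn independently of earner_1
family = earner_1 + earner_2                  # sum of two uncorrelated incomes

G_individual = gini(earner_1)
G_family = gini(family)
print(G_individual, G_family)
```

The sum of two independent exponential variables follows the density $r \exp(-r/T)$ up to normalization, so the simulated family incomes sample the distribution that results from the convolution of two exponential densities.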

On the basis of the analysis presented above, we propose a concept of the equilibrium 
inequality in a society, characterized by G = 1/2 for individual income and G = 3/8 for family 
income. It is a consequence of the exponential Boltzmann-Gibbs distribution in thermal 
equilibrium, which maximizes the entropy $S = -\int_0^\infty dr\, P(r) \ln P(r)$ of a distribution P(r) under
the constraint of the conservation law $\langle r \rangle = \int_0^\infty dr\, P(r)\, r = \mathrm{const}$. Thus, any deviation of
income distribution from the exponential one, to either less inequality or more inequality,
reduces entropy and is not favored by the second law of thermodynamics. Such deviations
may be possible only due to non-equilibrium effects. The presented data show that the great 
majority of the US population is in thermal equilibrium. 
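
The statement that any deviation from the exponential distribution at fixed mean income reduces the entropy can be illustrated numerically. The sketch below compares three densities with the same mean, with the entropy taken as minus the integral of P ln P. The gamma density with shape 2 (less inequality) and the two-exponential mixture (more inequality) are examples chosen for this illustration only.

```python
import numpy as np

def entropy(P, r):
    """S = -integral of P ln P, with the convention 0 * ln 0 = 0."""
    integrand = np.where(P > 0.0, P * np.log(np.where(P > 0.0, P, 1.0)), 0.0)
    return -float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(r)) / 2.0)

T = 1.0                                   # mean income, in arbitrary units
r = np.linspace(0.0, 80.0 * T, 800_001)

P_exp = np.exp(-r / T) / T                                        # exponential, mean T
theta = T / 2.0
P_gamma = r * np.exp(-r / theta) / theta**2                       # gamma with shape 2, mean T
T1, T2 = 0.5 * T, 1.5 * T
P_mix = 0.5 * np.exp(-r / T1) / T1 + 0.5 * np.exp(-r / T2) / T2   # mixture, mean T

S_exp, S_gamma, S_mix = entropy(P_exp, r), entropy(P_gamma, r), entropy(P_mix, r)
print(S_exp, S_gamma, S_mix)
```

For the exponential density the entropy equals 1 + ln T, and both alternative densities should come out lower.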

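Finally, we briefly discuss how the two-class structure of income distribution can be
rationalized on the basis of a kinetic approach, which deals with temporal evolution of the
probability distribution P(r, t). Let us consider a diffusion model, where income r changes
by $\Delta r$ over a period of time $\Delta t$. Then, temporal evolution of P(r, t) is described by the
Fokker-Planck equation [24]

$$\frac{\partial P}{\partial t} = \frac{\partial}{\partial r}\left[A P + \frac{\partial}{\partial r}(B P)\right], \quad A = -\frac{\langle \Delta r \rangle}{\Delta t}, \quad B = \frac{\langle (\Delta r)^2 \rangle}{2 \Delta t}. \qquad (3)$$
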
For the lower part of the distribution, it is reasonable to assume that $\Delta r$ is independent of r.
In this case, the coefficients A and B are constants. Then, the stationary solution $\partial_t P = 0$ of
Eq. (3) gives the exponential distribution [8] $P(r) \propto \exp(-r/T)$ with T = B/A. Notice that a
meaningful solution requires that A > 0, i.e. $\langle \Delta r \rangle < 0$ in Eq. (3). On the other hand, for the
upper tail of income distribution, it is reasonable to expect that $\Delta r \propto r$ (the Gibrat law [16]),
so $A = ar$ and $B = br^2$. Then, the stationary solution $\partial_t P = 0$ of Eq. (3) gives the power-law
distribution $P(r) \propto 1/r^{\alpha+1}$ with $\alpha = 1 + a/b$. The former process is additive diffusion, where
income changes by certain amounts, whereas the latter process is multiplicative diffusion,
where income changes by certain percentages. The lower class income comes from wages
and salaries, so the additive process is appropriate, whereas the upper class income comes
from investments, capital gains, etc., where the multiplicative process is applicable. Ref. [4]
quantitatively studied income kinetics using tax data for the upper class in Japan and found
that it is indeed governed by a multiplicative process. The data on income mobility in the
USA are not readily available publicly, but are accessible to the Statistics of Income Research
Division of the IRS. Such data would allow one to verify the conjectures about income kinetics.
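
The two stationary solutions quoted above can be checked numerically. The sketch assumes that the Fokker-Planck equation has the form $\partial_t P = \partial_r [A P + \partial_r (B P)]$, so that a stationary solution with zero flux satisfies $A P + \partial_r (B P) = 0$. That form is an assumption of this sketch, chosen to be consistent with T = B/A and $\alpha = 1 + a/b$; the numerical constants are hypothetical.

```python
import numpy as np

def stationary_residual(A, B, P, r):
    """Largest |A P + d(B P)/dr| relative to the largest |A P|, interior points only."""
    flux = A * P + np.gradient(B * P, r)
    return float(np.max(np.abs(flux[1:-1])) / np.max(np.abs((A * P)[1:-1])))

r = np.linspace(1.0, 10.0, 90_001)

# Additive diffusion: constant A and B give P ~ exp(-r/T) with T = B/A.
A0, B0 = 0.5, 2.0                      # hypothetical constants
res_additive = stationary_residual(A0, B0, np.exp(-r * A0 / B0), r)

# Multiplicative diffusion: A = a r, B = b r^2 give P ~ 1/r^(alpha + 1), alpha = 1 + a/b.
a, b = 0.3, 0.6                        # hypothetical constants
alpha = 1.0 + a / b
res_multiplicative = stationary_residual(a * r, b * r**2, r ** (-(alpha + 1.0)), r)

print(res_additive, res_multiplicative)
```

Both residuals should be small, limited only by the finite-difference error of `np.gradient`.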

The exponential probability distribution $P(r) \propto \exp(-r/T)$ is a monotonic function of
r with the most probable income r = 0. The probability densities shown in Fig. 1 agree
reasonably well with this simple exponential law. However, a number of other studies found
a nonmonotonic P(r) with a maximum at $r \neq 0$ and P(0) = 0. These data were fitted
by the log-normal [5-7] or the gamma distribution [18,19,25]. The origin of the discrepancy 
in the low-income data between our work and other papers is not completely clear at this 
moment. The following factors may possibly play a role. First, one should be careful to 
distinguish between personal income and group income, such as family and household income. 
As Eq. (2) shows, the latter is given by the gamma distribution even when the personal income
distribution is exponential. Very often statistical data are given for households and mix 
individual and group income distributions (see more discussion in Ref. [9]). Second, the 
data from tax agencies and census bureaus may differ. The former data are obtained from 
tax declarations of all taxable population, whereas the latter data come from questionnaire surveys
of a limited sample of population. These two methodologies may produce different results, 
particularly for low incomes. Third, it is necessary to distinguish between distributions of 
money [8,25,26], wealth [19,27], and income. They are, presumably, closely related, but may
be different in some respects. Fourth, the low-income probability density may be different in 
the USA and in other countries because of different social security policies. All these questions 
require careful investigation in future work. We can only say that the data sets analyzed in 
this paper and our previous papers are well described by a simple exponential function for 
the whole lower class. This does not exclude a possibility that other functions can also fit 
the data [28]. However, the exponential law has only one fitting parameter T, whereas log- 
normal, gamma, and other distributions have two or more fitting parameters, so they are less 
parsimonious. 